Final Project

Group 1:

  • Nathania Stephens

  • Hiba Awan

Abstract

Introduction & Background

Motivation/ Purpose

Goals/ Objectives

Data

Overview

About the Data

Three datasets from the Fairfax County Police Department were used, covering arrests, citations, and warnings recorded in 2023. For simplicity, general definitions are provided:

Arrest - When a person is taken into custody to answer for an offense, or when a person’s liberty is deprived or restrained in any significant way.

Citation - Formal notice issued by a law enforcement officer for a violation of law, typically related to traffic laws or other minor offenses. It usually requires the violator to appear in court or pay a fine.

Warning - When a violation, typically minor, has occurred but the officer issues a warning rather than a citation.

The following attributes were key to the research conducted:

Column Name   Data Type   Description
Date          Date        Date of offense
Time          Chr         123
Offense       1           1

Limitations and Assumptions

Cleaning and Transformation

Exploratory Analysis

A map of the arrest data gives a geospatial view of where arrests occur.

Next, we look at the top 10 arrest types by Incident Based Reporting (IBR) code.

Next, we examine the top 10 citations.
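As a rough sketch of how a top 10 tally like the ones above could be produced, the snippet below counts a categorical column and keeps the ten most frequent values. The data frame `toy_arrests` and the column `IBR_Code` are made-up placeholders for illustration, not the project's actual data or column names.

```r
library(dplyr)

# Toy stand-in for the arrest data: 12 hypothetical codes with
# frequencies 12, 11, ..., 1 (NOT the Fairfax County data)
toy_arrests <- data.frame(
  IBR_Code = rep(paste0("IBR_", 1:12), times = 12:1)
)

# Count each code, sort by frequency, and keep the ten most common
top10 <- toy_arrests %>%
  count(IBR_Code, sort = TRUE) %>%
  slice_head(n = 10)

top10
```

The same pattern would apply to the citation data by swapping in the column that holds the charge description.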

Warning Versus Citation

Next, warnings are compared with citations. This helps show which factors could play into receiving a warning rather than a citation.

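library(readr)
library(lubridate)
library(dplyr)    # %>%, rename(), mutate(), case_when(), select(), count()
library(ggplot2)  # ggplot() for the charts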

warnings = read_csv("2023_warning_data.csv", 
                         col_types = cols(Warnings_Date = col_date(format = "%m/%d/%Y"), 
                                          WEB_ADDRESS = col_skip(), PHONE_NUMBER = col_skip(), 
                                          NAME = col_skip()))

citations = read_csv("2023_citation_data.csv", 
                                col_types = cols(Date = col_date(format = "%m/%d/%Y"), 
                                                 WEB_ADDRESS = col_skip(), PHONE_NUMBER = col_skip(), 
                                                 NAME = col_skip()))

# Rename some columns
citations = citations %>%
  rename(ViolationDate = Date)

# Rename Gender to Sex (to match citations) and standardize the date column name
warnings = warnings %>%
  rename(Sex = Gender, ViolationDate = Warnings_Date)

# Adjust Citations and prepare for Merge
# Assumption that ID is the officer's ID
citations_processed = citations %>%
  mutate(
    outcome = "Citation",
    Gender = case_when(
      Sex == "M" ~ "Male",
      Sex == "F" ~ "Female",
      TRUE ~ "Other/Unknown"
    ),
    Year = year(ViolationDate),
    Month = month(ViolationDate),
    DayOfMonth = day(ViolationDate),
    Time = parse_date_time(Time, "HM"),
    data_type = "Citation"
  ) %>%
  select(
    outcome, Gender, Year, Month, DayOfMonth, Time, Offense_Description = Charge,
    District = DISTRICT, Race, Ethnicity, Latitude, Longitude, OfficerID = ID, data_type
  )

# Adjust Warnings and prepare for Merge
warnings_processed = warnings %>%
  mutate(
    outcome = "Warning",
    Gender = case_when(
      Sex == "M" ~ "Male",
      Sex == "F" ~ "Female",
      TRUE ~ "Other/Unknown"
    ),
    Year = year(ViolationDate),
    Month = month(ViolationDate),
    DayOfMonth = day(ViolationDate),
    Time = parse_date_time(Time, "HM"),
    data_type = "Warning"
  ) %>%
  select(
    outcome, Gender, Year, Month, DayOfMonth, Time, Offense_Description, District = DISTRICT, Race, 
    Ethnicity, Latitude = Lat, Longitude = Long, OfficerID = Officer_ID, data_type
  )

# Combine citations and warnings into a single data set
combined_wc = bind_rows(citations_processed, warnings_processed)

# Add the binary outcome: 0 = Citation, 1 = Warning (got out of a ticket)
combined_wc = combined_wc %>%
  mutate(
    BinaryOutcome = ifelse(outcome == "Warning", 1,0)
  )

## Change to Title Case for District Names
combined_wc$District = tools::toTitleCase(tolower(combined_wc$District))

## Examine the Unverified district
## Unverified makes up only 0.0143 (1.43%) of the data set, so those rows are removed
## because they are a very small share of the total.
combined_wc %>%
  count(District) %>%
  mutate(Proportion = n / sum(n)) %>%
  arrange(desc(n))
# A tibble: 11 × 3
   District         n Proportion
   <chr>        <int>      <dbl>
 1 Sully        18612  0.208    
 2 Springfield  12581  0.140    
 3 Braddock     10292  0.115    
 4 Franconia    10033  0.112    
 5 Hunter Mill   8718  0.0972   
 6 Mason         8168  0.0911   
 7 Dranesville   7143  0.0797   
 8 Providence    6713  0.0749   
 9 Mount Vernon  6113  0.0682   
10 Unverified    1281  0.0143   
11 <NA>             1  0.0000112
## Filter out Unverified and NA
combined_wc = combined_wc %>%
  filter(District != "Unverified")

combined_wc = combined_wc %>%
  filter(!is.na(District))

## Filter out Other/Unknown Gender
combined_wc_mf = combined_wc %>%
  filter(Gender != "Other/Unknown")

## Now for some visuals: Gender Chart
## Examining the proportion of stops resulting in a Warning Vs Citation
## the Warning rate is the proportion of incidents that are warnings.
gender_warning_rate = combined_wc_mf %>%
  group_by(Gender) %>%
  summarise(
    Total_Incidents = n(),
    Warning_Rate = mean(BinaryOutcome)
  ) %>%
  ungroup()

gender_chart = ggplot(gender_warning_rate,
                      aes(x = Gender, y = Warning_Rate, fill = Gender)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = scales::percent(Warning_Rate, accuracy = 0.1)),
            vjust = -0.5, size = 5) +
  scale_y_continuous(labels = scales::percent, limits = c(0, max(gender_warning_rate$Warning_Rate) * 1.1)) +
  labs(
    title = "Warning Rate by Gender",
    subtitle = "Proportion of stops resulting in a Warning (vs Citation)",
    x = "Gender",
    y = "Warning Rate"
  ) +
  theme_gray() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  scale_fill_manual(values = c("Female" = "pink", "Male" = "skyblue"))

gender_chart

## Now the Chi-Squared Test starting with the Contingency Table
contingency_tbl = combined_wc_mf %>%
  filter(Gender %in% c("Male", "Female")) %>%
  select(Gender, BinaryOutcome) %>%
  table()

contingency_tbl
        BinaryOutcome
Gender       0     1
  Female 20478  8777
  Male   43657 15408
chi_sq_results = chisq.test(contingency_tbl)

chi_sq_results

    Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_tbl
X-squared = 150.62, df = 1, p-value < 2.2e-16
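
With roughly 88,000 stops, even a small difference produces a very small p-value, so it may help to look at the size of the effect as well. The sketch below is not part of the original analysis; it re-enters the counts from the contingency table above and computes the warning rate per gender, the Female-to-Male odds ratio, and Cramér's V using base R only. The object names (`tbl`, `warning_rate`, `odds_ratio`, `cramers_v`) are illustrative.

```r
# Counts copied from the contingency table above (0 = Citation, 1 = Warning)
tbl <- matrix(c(20478,  8777,
                43657, 15408),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              BinaryOutcome = c("0", "1")))

# Row-wise proportions: share of each gender's stops that ended in a warning
warning_rate <- prop.table(tbl, margin = 1)[, "1"]

# Odds of a warning for each gender, and the Female-to-Male odds ratio
odds <- tbl[, "1"] / tbl[, "0"]
odds_ratio <- odds[["Female"]] / odds[["Male"]]

# Cramer's V from the chi-squared statistic without the continuity correction
chi_sq <- chisq.test(tbl, correct = FALSE)
cramers_v <- sqrt(unname(chi_sq$statistic) / (sum(tbl) * (min(dim(tbl)) - 1)))

round(warning_rate, 3)
round(odds_ratio, 2)
round(cramers_v, 3)
```

On these counts the warning rates come out to about 30.0% for women and 26.1% for men, with an odds ratio near 1.21 and a Cramér's V of about 0.04, so the association is statistically significant but small in magnitude.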

Research Questions

  1. Is there an association between gender and receiving a warning rather than a citation?

  2. Are there other factors that determine whether someone gets out of a “ticket”? Alternatively, are you more likely to get a ticket at the end of the month? (Some believe that police officers have a monthly quota.)
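
One possible way to approach the second question is a logistic regression of the warning indicator on gender and an end-of-month flag. The sketch below is only an illustration under stated assumptions: it runs on simulated rows rather than the project data, the `EndOfMonth` variable and its day-25 cutoff are hypothetical choices, and the column names simply mirror `Gender`, `DayOfMonth`, and `BinaryOutcome` from the combined data set.

```r
set.seed(1)

# Toy stand-in for combined_wc_mf: simulated rows, NOT the Fairfax County data
n <- 2000
toy <- data.frame(
  Gender     = sample(c("Female", "Male"), n, replace = TRUE),
  DayOfMonth = sample(1:28, n, replace = TRUE)
)
toy$BinaryOutcome <- rbinom(n, size = 1, prob = 0.27)  # 1 = Warning, 0 = Citation

# Hypothetical flag: stops on day 25 or later are treated as "end of month"
toy$EndOfMonth <- toy$DayOfMonth >= 25

# Logistic regression of the warning indicator on gender and the flag
fit <- glm(BinaryOutcome ~ Gender + EndOfMonth, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios for receiving a warning
exp(coef(fit))
```

Because the toy outcome is generated independently of both predictors, the fitted odds ratios here carry no meaning; the point is only the model structure, which could be refit on the real data with further predictors such as district or offense type.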

Conclusion

References